Text Mining for Documents Annotation and Ontology Support
نویسندگان
چکیده
This paper presents a survey of basic concepts in the area of text data mining and some of the methods used in order to elicit useful knowledge from collections of textual data. Three different text data mining techniques (clustering/visualisation, association rules and classification models) are analysed and its exploitation possibilities within the Webocracy project are showed. Clustering and association rules discovery are well suited as supporting tools for ontology management. Classification models are used for automatic documents annotation.
منابع مشابه
Linguistic Annotation for the Semantic Web
Establishing the semantic web on a large scale implies the widespread annotation of web documents with ontology-based knowledge markup. For this purpose, tools have been developed that allow for semi-automatic annotation of web documents with ontology-based metadata. However, given that a large number of web documents consist either fully or at least partially of free text, language technology ...
متن کاملخوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کاملDocument indexing for automatic semantic annotation support
Nowadays, capturing the knowledge in ontological structures is one of the primary focuses of the knowledge management research. To exploit the knowledge from the vast quantity of existing unstructured texts available in natural languages in ontologies, tools for automatic semantic annotation (ASA) are heavily needed. In this paper, we present an approach to ASA and a method for documents conten...
متن کاملBiomedical Document Triage Based on Figure Classification
The annotation task in model organism databases is to assign attributes, such as Gene Ontology (GO) codes, to biological entities, such as genes and proteins based on the evidence found in documents or other resources. Document triage precedes an annotation task; it identifies relevant documents that can support the annotation process. Annotation in organism databases involves manual efforts of...
متن کاملUsing Term-Matching Algorithms for the Annotation of Geo-services
This paper presents an approach for automating semantic annotation within service-oriented architectures that provide interfaces to databases of spatial-information objects. The automation of the annotation process facilitates the transition from the current state-of-the-art architectures towards semantically-enabled architectures. We see the annotation process as the task of matching an arbitr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003